Ordinary Web Pages as a Source for Metadata Acquisition for Open Corpus User Modeling
Abstract
Personalization and adaptivity on the Web as we know it today are often “closed” within a particular web-based system. As a result, there are only a few “personalized islands” within the whole Web. Spreading personalization to the whole Web, either via an enhanced proxy server or via an agent residing on the client side, raises the challenge of how to determine metadata within an open corpus Web domain that would allow for efficient creation of an overlay user model. In this paper we present our approach to metadata acquisition for open corpus user modeling applicable to the “wild” Web, where we decided to take into account metadata in the form of keywords representing the visited web pages. We present the user modeling process (which is thus keyword-based) built on top of an enhanced proxy server capable of personalizing user browsing sessions via pluggable modules. The paper focuses on a comparison of algorithms and third-party services that allow for extraction of the required keywords from ordinary web pages, which is a crucial step of our user modeling approach.
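The keyword-based modeling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the algorithms or third-party services that were compared, so plain term frequency is swapped in here as a stand-in, and the names `extract_keywords`, `update_user_model`, and the small `STOPWORDS` set are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "and", "for", "with", "that", "this", "from", "are", "was"}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_keywords(html, top_k=5):
    """Return up to top_k (keyword, count) pairs ranked by raw term frequency."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Keep alphabetic tokens of at least three letters that are not stopwords.
    tokens = [t for t in re.findall(r"[a-z]{3,}", text) if t not in STOPWORDS]
    return Counter(tokens).most_common(top_k)


def update_user_model(model, keywords):
    """Add the keyword counts of one visited page to a keyword -> weight model."""
    for word, count in keywords:
        model[word] = model.get(word, 0) + count
    return model
```

In a proxy-based setup like the one the abstract describes, a pluggable module could call `extract_keywords` on each response body and pass the result to `update_user_model`. A real deployment would presumably replace raw term frequency with one of the extraction algorithms or services the paper evaluates.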
Similar resources
Conformity of the Metadata Elements of the Websites of Central Libraries of Medical Sciences Universities with Dublin Core Metadata Elements
Introduction: Considering the importance of library websites in the establishment of communication and provision of services for their users, it is crucial to include those features in these websites which can lead to increased dynamism and optimal communication. The present study aimed at comparing Metadata elements of Dublin Core with those of the websites of Central Libraries of Medical Univ...
Anomaly Detection on the Web through the Creation of Access Usage Profiles
Due to the increase in cyber-attacks, the need for techniques to detect attacks on web servers has drawn attention. Unfortunately, many available security solutions are inefficient at identifying web-based attacks. The main aim of this study is to detect abnormal web navigation based on web usage profiles. In this paper, comparing scrolling behavior of a normal user with an attacker, and simu...
RIDIRE-CPI: an Open Source Crawling and Processing Infrastructure for Web Corpora Building
This paper introduces the RIDIRE-CPI, an open source tool for the building of web corpora with a specific design through a targeted crawling strategy. The tool has been developed within the RIDIRE Project, which aims at creating a 2 billion word balanced web corpus for Italian. RIDIRE-CPI architecture integrates existing open source tools as well as modules developed specifically within the RID...
Grass-roots Semantic Web Tools
One of the biggest challenges of the Semantic Web is to make its tools usable by ordinary users for grass-roots production and integration of semantic information. This paper introduces the ongoing research on this issue in our research group at the Information Sciences Institute. 1. RESEARCH OVERVIEW Despite years of intense work and research on the Semantic Web, it has not become a reality. O...
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user’s item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
Publication date: 2010